Skip to content

feat(secrets): add --all-repos and --source via Pipeline Preview discovery#624

Merged
jamesadevine merged 12 commits into
mainfrom
devinejames/secret-setting
May 19, 2026
Merged

feat(secrets): add --all-repos and --source via Pipeline Preview discovery#624
jamesadevine merged 12 commits into
mainfrom
devinejames/secret-setting

Conversation

@jamesadevine
Copy link
Copy Markdown
Collaborator

@jamesadevine jamesadevine commented May 18, 2026

Summary

Adds project-scope token management to ado-aw via two new flags on the secrets subcommands:

  • --all-repos ÔÇö operate on every ado-aw pipeline ADO knows about in the project (direct ado-aw definitions and consumer pipelines that include ado-aw templates), regardless of which repo their root YAML lives in.
  • --source <path> ÔÇö filter to consumers of one specific template, e.g. agents/security-scan.md.

Composing these solves the user-visible pain that motivated the work:

# Rotate GITHUB_TOKEN on every consumer of one template, anywhere in the project:
ado-aw secrets set GITHUB_TOKEN --source agents/security-scan.md --all-repos

The legacy lexical local-fixture matcher remains the default; --definition-ids remains the explicit-ID escape hatch. enable / disable / remove keep their existing source-scoped safety semantics and are intentionally not changed.

How it works

Two pieces of new infrastructure make the above possible:

  1. Always-on marker step (src/compile/extensions/ado_aw_marker.rs). Every compiled pipeline now carries a # ado-aw-metadata: {schema,source,version,target} JSON line inside an Agent-job prepare-phase bash step.

    Why a bash step (and not a top-of-file comment): ADO's Pipeline Preview API strips top-of-document comments during YAML expansion ÔÇö verified empirically against live def 2434 in msazuresphere/4x4 ÔÇö but preserves comments inside step bodies (80 such lines survived the spike). The marker has to live inside a step body to survive Preview-driven discovery.

    Uniform across all four targets (standalone / 1es / job / stage); no per-target placement special-casing. Implemented via the existing CompilerExtension::prepare_steps hook (NOT setup_steps — that would spin up an extra pool agent just for a metadata comment) so it cleanly slots in alongside GitHubExtension and SafeOutputsExtension in collect_extensions.

  2. Preview-driven discovery (src/ado/discovery.rs). New discover_ado_aw_pipelines(scope) enumerates project definitions, calls POST /_apis/pipelines/{id}/preview per definition (8-permit semaphore, env-tunable via ADO_AW_PREVIEW_CONCURRENCY), and scans the response's finalYaml for marker steps via the existing crate::detect::parse_marker_step. Definitions whose process.yamlFilename matches a local lock file take a fast path that parses the local header directly and skips the Preview call.

    DiscoveryStatus classifies each definition as Direct / Consumer / UnknownRequiredParams / UnknownForbidden / PreviewFailed(_) / NotAdoAw. Pure scope filtering and classification logic are factored out for unit-testing without HTTP.

Files

  • src/compile/extensions/ado_aw_marker.rs (new) ÔÇö the always-on marker extension.
  • src/ado/discovery.rs (new) ÔÇö Preview client, DiscoveryScope, DiscoveryStatus, discover_ado_aw_pipelines, adapters into MatchedDefinition.
  • src/detect.rs ÔÇö adds MarkerMetadata + parse_marker_step. Existing # @ado-aw parser unchanged.
  • src/compile/extensions/mod.rs ÔÇö registers AdoAwMarkerExtension as always-on; CompileContext gains input_path.
  • src/compile/{standalone,onees,common}.rs ÔÇö pass input_path to CompileContext::new; 1ES generate_setup_job now routes extension setup_steps (was previously the only target that didn't).
  • src/compile/types.rs ÔÇö CompileTarget::as_str().
  • src/compile/common.rs ÔÇö extracted normalize_source_path helper shared by header comment and marker extension.
  • src/ado/mod.rs ÔÇö DefinitionSummary gains repository + revision; new Repository struct; new MatchMethod::Discovery.
  • src/secrets.rs, src/main.rs ÔÇö --all-repos / --source flag plumbing; resolve_for_command chooses between legacy and discovery paths.
  • docs/cli.md ÔÇö documents the new flags and the project-scope-discovery section.
  • Test helper updates in src/enable.rs / src/list.rs (extending the def(...) helpers for the new DefinitionSummary fields).

Empirical grounding

Used az-acquired bearer auth to run a live spike against msazuresphere/4x4 def 2434 (OS Release Readiness) before settling on the marker design. The spike confirmed:

  • POST /_apis/pipelines/2434/preview?api-version=7.1-preview.1 works end-to-end and returns 56 KB of expanded YAML in finalYaml.
  • The source file's leading # This file is auto-generated by ado-awÔǪ and # @ado-aw source="ÔǪ" version=ÔǪ are not in the expanded YAML (header is stripped during expansion).
  • 80 # comment lines inside step bodies are preserved verbatim.

That finding mandated moving the marker from a top-of-file comment into a step body, which this PR implements.

Test plan

  • cargo test: 1739 passed, 0 failed.
  • cargo clippy --all-targets --all-features: zero new warnings (only pre-existing baseline warnings remain).
  • 6 unit tests for parse_marker_step (single / multiple / malformed JSON / forward-compat unknown fields / no-match / prefix-only-without-JSON).
  • 4 integration tests in tests/compiler_tests.rs assert the marker step appears with correct source / version / target / schema for every target.
  • 12 unit tests in src/ado/discovery.rs for scope filtering, Direct/Consumer classification, and the local-lock fast-path lookup table.
  • Live secrets list --definition-ids 2434 against msazuresphere/4x4 confirms the legacy path still works.
  • Live secrets list --all-repos --source agents/release-readiness.md --debug confirms the discovery path correctly fans out POST /_apis/pipelines/{id}/preview calls to every definition in the project. (Returns no match for def 2434 because that definition was compiled before this PR; resolves once the dogfood recompile follow-up lands.)

Follow-up

Pre-marker definitions remain findable only via the process.yamlFilename fast-path until they're recompiled with this PR's ado-aw. A separate PR will run ado-aw compile against every dogfood pipeline and push the regenerated lock files. Tracked as a known follow-up; no flag day required because the legacy match path stays as a fallback.

@github-actions
Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Solid overall — the three-layer design (marker extension → parse_marker_step → discovery) is clean and well-tested. A few concrete issues worth addressing before merge.

Findings

🐛 Bugs / Logic Issues

  • src/ado/discovery.rs:441-444is_direct_match logic inversion

    if markers.len() != 1 {
        return markers.is_empty(); // returns true for 0 markers!
    }

    When called with 0 markers this returns true, which would classify a non-ado-aw definition as Direct. The current callsite in classify_definition guards against it, but this is a latent bug waiting for the next caller. Should be return false — 0 markers means "not ado-aw", not "direct match".

  • src/ado/discovery.rs:502-513 — doc comment contradicts implementation for UnknownRequiredParams
    The doc comment says:

    UnknownRequiredParams is propagated with a marker-less MatchedDefinition because the user explicitly opted in and may want to act on it.

    But the code returns None for UnknownRequiredParams (same arm as NotAdoAw, UnknownForbidden, PreviewFailed). Either the behavior needs to match the comment (propagate it) or the comment is stale. Silently dropping UnknownRequiredParams definitions means a user running secrets set --all-repos gets no indication that some pipelines were skipped because they require parameters — which is surprising.

  • src/ado/discovery.rs:563-575repo_url() missing percent-encoding

    Some(format!("{}/{}/_git/{}", self.org_url.trim_end_matches('/'), self.project, self.repo_name))

    ADO project names can contain spaces (e.g. "My Project"). The API returns repository.url as a URL where spaces are percent-encoded (My%20Project), but self.project stores the decoded string. After normalize_repo_url lowercases both sides, "my project" still won't equal "my%20project". DiscoveryScope::CurrentRepo (which is the scope when --source is used alone) would silently return no results for any project with spaces in its name. The rest of the codebase uses percent_encoding::utf8_percent_encode(&ctx.project, PATH_SEGMENT) for URL construction — same fix needed here.

🔒 Security Concerns

  • src/compile/extensions/ado_aw_marker.rs:966-975 — single-quote injection in generated echo step
    The echo line uses single-quoting around the user-controlled source value:
    echo 'ado-aw metadata: source={source} version={version} target={target}'
    If a source path contains a ' (e.g. agents/foo's-agent.md), the generated bash is syntactically broken. normalize_source_path strips " but not '. Since this is a compiler-generated step for arbitrary user-provided paths, normalize_source_path should also strip or escape single quotes — or the echo should use double-quoting with the JSON-safe form.

⚠️ Suggestions

  • src/detect.rs:1420,1428,1459 — stale #[allow(dead_code)] annotations
    MARKER_STEP_PREFIX, MarkerMetadata, and parse_marker_step are all actively consumed by src/ado/discovery.rs which this PR wires in. The "upcoming Preview-driven discovery" comments are no longer accurate. These attrs can be removed.

  • src/secrets.rs:1762 — fast-path bypassed in discovery
    resolve_definitions_via_discovery is called with local_lock_paths: None, meaning even definitions with a matching local lock file will spend a Preview API round-trip. If the caller has a local path available (e.g. via --source pointing to a local file), threading it through would save calls for direct pipelines. Not a correctness issue, but could matter in large projects.

✅ What Looks Good

  • The parse_marker_step design (line-by-line scan, skip-malformed, forward-compatible via #[serde(deny_unknown_fields)]-free deserialization) is exactly right for an embedded-in-YAML marker format.
  • The 8-permit semaphore with ADO_AW_PREVIEW_CONCURRENCY override is thoughtful — big project operators have an escape hatch without a code change.
  • The build_lock_path_map / normalize_ado_yaml_path key alignment is correct and tested.
  • discovered_to_matched keeping the rest of the CLI commands unchanged is a clean adapter pattern.
  • Test coverage is thorough — 6 unit tests for the parser, 4 integration tests across all 4 targets, 12 unit tests for scope/classification logic.

Generated by Rust PR Reviewer for issue #624 · ● 1.1M ·

jamesadevine added a commit that referenced this pull request May 18, 2026
- `is_direct_match`: return `false` (not `markers.is_empty()`) for the
  marker-less case. The previous code returned `true` for 0 markers,
  which would misclassify a non-ado-aw definition as `Direct` if any
  future caller hit that path. Belt-and-braces — the current callsite
  in `classify_definition` already guards against it.

- `discovered_to_matched`: doc-comment claimed `UnknownRequiredParams`
  was propagated; implementation drops it. Keep the safer drop
  behaviour (we can't act on a marker-less definition) but surface a
  `warn!` summary from `resolve_definitions_via_discovery` so
  `secrets set --all-repos` operators can see when pipelines were
  skipped because of required-template-parameters / 403 / other
  preview failures. Doc comment updated to match.

- `AdoContext::repo_url`: percent-encode `project` and `repo_name` so
  `DiscoveryScope::CurrentRepo` works for projects whose names contain
  reserved characters (e.g. spaces). The lowercase normalize step
  can't reconcile a decoded local name with the encoded form ADO
  returns in `repository.url`.

- `AdoAwMarkerExtension`: bash-quote-escape the source path embedded
  in the runtime `echo` line. A markdown filename containing `'`
  (e.g. `agents/foo's.md`) would otherwise produce syntactically
  broken bash. New `bash_single_quote_escape` helper applies the
  canonical `'\''` idiom; the JSON marker line keeps the raw value
  because JSON has no quoting concern with `'`. Two new tests cover
  the idiom and a `foo's-agent.md` path.

- `src/detect.rs`: drop the now-stale `#[allow(dead_code)]` attrs on
  `MARKER_STEP_PREFIX`, `MarkerMetadata`, and `parse_marker_step`.
  All three are actively consumed by `src/ado/discovery.rs`.

- `resolve_for_command`: thread local-lock-file paths into discovery
  so the `process.yamlFilename` fast-path can skip Preview calls for
  locally-compiled pipelines. Best-effort scan — failures fall back
  to Preview-for-everything cleanly.

All 1741 tests pass; clippy clean on touched files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Solid, well-tested feature addition — two real issues worth addressing before merge.

Findings

🔒 Security Concerns

  • src/compile/extensions/ado_aw_marker.rs:1021-1029 — The echo line in the generated bash step embeds the user-controlled source path inside a single-quoted string: echo 'ado-aw metadata: source={echo_source} ...'. The bash_single_quote_escape correctly handles ' quoting, but if a file is named something like agents/##vso[task.setvariable variable=FOO]value.md, the ##vso[ sequence survives into stdout verbatim, and the ADO agent's logging-command scanner would process it as a task command. normalize_source_path strips " and \n/\r but not ##vso[. Suggest adding a ##vso[ strip/replace in normalize_source_path (or directly before embedding in the echo) — consistent with the sanitisation already applied to other user-facing YAML surfaces.

🐛 Bugs / Logic Issues

  • src/ado/discovery.rs:465-492 (resolve_definitions_via_discovery) — The skip-counters (skipped_required_params, skipped_forbidden, skipped_failed) are incremented inside the .filter() closure before the source_filter check. When --source agents/foo.md is active, definitions with UnknownRequiredParams / Forbidden / PreviewFailed are counted even if they have no markers and are wholly unrelated to the requested template. The resulting warnings ("Discovery skipped N definitions whose Pipeline Preview requires templateParameters...") will mislead users into thinking they might be missing consumers of agents/foo.md when those N definitions were unrelated pipelines that simply failed. Fix: only count a definition toward skip totals when either source_filter is None, or when the definition does reference the target source (i.e., filter first, count second — or two-pass).

⚠️ Suggestions

  • src/ado/discovery.rs:38#![allow(dead_code)] with the comment "Wired into CLI commands in workstream S" is stale: resolve_definitions_via_discovery and discovered_to_matched are both called from secrets.rs in this PR. The file-level suppression silently hides any future dead-code regressions. Remove it (or narrow to only the pub async fn preview_pipeline if that's genuinely only used in tests for now).

  • src/main.rs:98-108 / src/secrets.rs:197-198--source alone (without --all-repos) silently activates DiscoveryScope::CurrentRepo. The CLI help text says "Activates the discovery code path; pairs with --all-repos..." but doesn't explain that without --all-repos it scopes only to the current repo's definitions. A user who runs secrets set GITHUB_TOKEN --source agents/foo.md and gets zero results because they're in a different repo will be confused. Consider adding [only searches definitions in the current repository; add --all-repos to search the full project] to the help string.

✅ What Looks Good

  • The empirical grounding (live spike against def 2434) and its impact on the architectural decision (marker in step body, not top-of-file comment) is excellent engineering rigour — well documented in both the PR description and the module doc comments.
  • bash_single_quote_escape + its dedicated test covering the '\'' idiom is exactly the right defensive approach; good test for single-quote in a path.
  • parse_marker_step is robust: trims leading whitespace, skips non-JSON hits, ignores unknown future fields via serde defaults. The six unit tests cover the edge cases well.
  • Fast-path short-circuit (local lock file → skip Preview call) with graceful fall-through when the file doesn't parse is a clean performance/resilience trade-off.
  • discovered_to_matched correctly gates writes to only Direct | Consumer statuses, preventing accidental action on ambiguous (UnknownRequiredParams, Forbidden) definitions.

Generated by Rust PR Reviewer for issue #624 · ● 771.9K ·

jamesadevine added a commit that referenced this pull request May 18, 2026
- `AdoAwMarkerExtension`: neutralise `##vso[` and `##[` logging-command
  prefixes in the source path before embedding it in the marker step's
  runtime `echo` line. Without this, a markdown filename like
  `agents/##vso[task.setvariable variable=FOO]value.md` would echo a
  literal `##vso[...]` sequence that the ADO build agent's stdout
  scanner treats as a task command — a logging-command-injection
  primitive any attacker controlling a filename could trigger. New
  `sanitize_for_vso_logging` helper mirrors the existing convention in
  `crate::agent_stats::sanitize_for_markdown` (`[vso-filtered][` /
  `[filtered][`). The `# ado-aw-metadata:` JSON line keeps the raw
  value (it's a YAML comment, not echoed to stdout). Two new tests:
  the sanitiser unit test and an end-to-end attack-payload roundtrip
  asserting the echo line is neutralised.

- `resolve_definitions_via_discovery`: the previous skip-counter
  implementation counted `UnknownRequiredParams` / `Forbidden` /
  `PreviewFailed` failures *before* applying `source_filter`, so under
  `--source agents/foo.md` the warnings would tell the user "N
  definitions skipped requiring template parameters" for definitions
  that had nothing to do with `agents/foo.md`. Split the counting:

  * without `--source`: per-status counts are honest (we're operating
    on every ado-aw pipeline) and the existing three warnings stand;
  * with `--source`: a single conservative `uninspectable` counter,
    surfaced as one warning that explicitly acknowledges we can't tell
    whether any of those skipped definitions would have been consumers
    of the filtered template.

- `src/ado/discovery.rs`: drop the file-level `#![allow(dead_code)]`.
  `resolve_definitions_via_discovery` and `discovered_to_matched` are
  now wired into `secrets.rs`; the suppression was hiding future
  dead-code regressions. Build is clean without it.

- `src/main.rs` (`SecretsCmd`): clarified `--source` help text — calls
  out that **without `--all-repos`, only the current repository is
  searched**. Saves the user-confusion case "I ran `secrets set
  GITHUB_TOKEN --source agents/foo.md` and got zero results" when
  they're in a different repo than the consumer pipelines.

All 1743 tests pass; clippy clean on touched files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Solid design with good test coverage — a few targeted fixes needed before merge.

Findings

🐛 Bugs / Logic Issues

  • src/ado/discovery.rs:551source_filter is not normalized before comparison

    The marker's source field is stored in normalized form (via normalize_source_path: strips ./, converts \ to /, strips newlines). But the value from --source is passed through as-is:

    Some(src) => d.markers.iter().any(|m| m.source == src),

    A user passing --source ./agents/foo.md or --source agents\foo.md (Windows) silently matches nothing instead of finding the expected pipeline. The fix is to call normalize_source_path(Path::new(src)) on the filter value before the comparison, matching the exact normalization applied at compile time.

  • src/ado/mod.rs:222 — misleading revision doc comment

    The field comment says "Used as a cache key by Preview-driven discovery so an unchanged definition is only previewed once per session", but discover_ado_aw_pipelines never reads revision and no caching logic exists anywhere in the PR. The comment will mislead future contributors looking for the caching logic or trying to add it. Either remove the caching claim or add a // TODO: implement revision-keyed caching note.

  • src/ado/discovery.rs:69DiscoveryScope::Explicit is dead code

    The variant is defined, handled in apply_scope_filter, and exercised in unit tests, but no callsite ever constructs it. The --definition-ids path is handled entirely by the legacy resolve_definitions function before DiscoveryScope is ever created. Clippy may not catch this because the unit tests construct it — but it should at least get a comment (// Reserved for future use) or a #[cfg(test)] annotation on the test-only construction to make the intent explicit.

⚠️ Suggestions

  • src/ado/discovery.rs:373PreviewError::NotFound maps to PreviewFailed

    A 404 from the Preview endpoint almost certainly means the definition was deleted in the window between list_definitions and preview_pipeline. It'll be emitted as a skipped_failed warning under --all-repos, confusing operators who'll see counts go up and then wonder why. Silently dropping 404s (or mapping them to a distinct DiscoveryStatus::NotFound that's excluded from the warning) would be more signal-correct.

  • src/secrets.rs:41--source without --all-repos with no git remote gives a generic error

    When a user runs ado-aw secrets set KEY --source agents/foo.md outside a git repo, repo_url() returns None, apply_scope_filter returns an empty vec, and the user gets "No ado-aw pipelines found via Preview-driven discovery." — with no hint that the scoping culprit is the missing git remote. A targeted message like "Could not determine current repo URL; try adding --all-repos to search the full project" would be much friendlier.

✅ What Looks Good

  • The marker-step approach is well-motivated and the empirical grounding (verifying ADO's Preview API strips top-of-file comments but preserves step body comments) is exactly the right level of rigor for a design decision like this.
  • Security handling on the echo line is thorough: sanitize_for_vso_logging covers ##vso[ and ##[, bash_single_quote_escape uses the canonical '\'' idiom, and both have dedicated unit tests including an injection attempt.
  • The parse_marker_step forward-compat behavior (unknown JSON fields silently ignored, malformed entries logged at debug and skipped rather than panicking) is the right default for a discovery surface that'll outlive compiler versions.
  • The fast-path (local lock file → skip Preview HTTP call) is a nice latency win for the common case and the fallback-to-Preview design is sound.
  • The 8-permit semaphore with ADO_AW_PREVIEW_CONCURRENCY env override is thoughtful — large projects won't accidentally hammer the ADO rate limiter.

Generated by Rust PR Reviewer for issue #624 · ● 736.3K ·

jamesadevine added a commit that referenced this pull request May 18, 2026
- `resolve_definitions_via_discovery`: normalize the user-supplied
  `--source` value through `normalize_source_path` before comparing
  against `marker.source`. The marker stores the canonical form
  (`agents/foo.md`), so without normalization a user typing
  `--source ./agents/foo.md` or `--source agents\foo.md` (Windows)
  silently matched nothing. Re-export `normalize_source_path` from
  `crate::compile` so callers outside the compile module tree can
  reach it cleanly. New test asserts the three common variants
  (canonical, leading-`./`, backslash) all produce the same normalized
  string.

- `classify_definition` / `DiscoveryStatus::NotFound`: 404 from the
  Preview endpoint almost certainly means the definition was deleted
  between `list_definitions` and `preview_pipeline` (TOCTOU race).
  Previously routed through `PreviewFailed`, which inflated
  `skipped_failed` counts and confused operators. New
  `DiscoveryStatus::NotFound` variant is excluded from skip-warning
  counters in `resolve_definitions_via_discovery` and dropped by
  `discovered_to_matched` like the other non-actionable statuses.
  Debug-logged with the definition id+name so `--debug` users can
  still see what happened.

- `DefinitionSummary::revision`: doc comment claimed Preview-driven
  discovery uses it as a cache key, but no caching is implemented.
  Rewrote to say it's deserialised for a future cache and there is
  *no* caching yet, with a clear "see the discovery module for current
  behaviour" pointer.

- `DiscoveryScope::Explicit`: clarified the docstring to call out
  that no production callsite constructs this variant — `--definition-ids`
  uses the legacy `resolve_definitions` path before discovery ever
  runs. Variant is kept (not removed and not `#[cfg(test)]`-gated)
  because direct API consumers may want to feed pre-filtered IDs into
  discovery; the existing unit-test construction stays.

- `secrets::resolve_for_command`: bail early with a targeted error
  when `--source` is used without `--all-repos` outside a recognised
  ADO repo. The previous behaviour was a generic "No ado-aw pipelines
  found via Preview-driven discovery" message that gave no hint that
  the empty result was caused by the missing git remote. New error
  spells out the cause and suggests `--all-repos`.

All 1746 tests pass; clippy clean on touched files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Solid implementation — the security work is careful and the architecture is well-thought-out. A few items worth discussing before merge.

Findings

🐛 Bugs / Logic Issues

  • src/ado/discovery.rs:354normalize_repo_url doesn't normalize percent-encoding vs literal spaces. repo_url() percent-encodes project/repo names before the comparison, but if ADO's REST API ever returns a repository.url with spaces decoded (e.g. my project instead of my%20project), CurrentRepo filtering silently drops everything and --source without --all-repos returns zero results with no helpful error. The comment claims this was verified against one live endpoint; it's a latent assumption worth documenting as such, or adding a decode→reencode round-trip to make the comparison encoding-independent.

  • src/ado/discovery.rs:565–571 — Consumer yaml_path is silently first-wins. discovered_to_matched sets yaml_path: d.markers.first().map(|m| m.source.clone()).unwrap_or_default(). For a Consumer that includes two templates (agents/a.md and agents/b.md), ado-aw list will consistently print → agents/a.md and silently omit agents/b.md. For secrets set, this means the displayed source reference in the summary may mislead the operator into thinking the variable was set only for one template when it was actually set on a pipeline that consumes both. The PR description acknowledges a richer surface is deferred, but the truncation isn't called out in the printed summary.

🔒 Security Concerns

  • src/compile/extensions/ado_aw_marker.rs:1183 + src/ado/discovery.rs:565##vso[ sanitization stops at the echo line. The echo correctly applies sanitize_for_vso_logging, but the raw source path flows unsanitized from the JSON comment → parse_marker_stepMarkerMeta(redacted) → MatchedDefinition::yaml_pathprint_matched_summary(which writes to stdout). Ifado-aw secrets set --all-reposis ever called from an ADO pipeline step (e.g., in an operator automation pipeline), a crafted filename like##vso[task.setvariable variable=MY_SECRET]x.md` would pass through the JSON comment (bash comment, harmless there) but then surface in the CLI's own stdout, where the ADO agent would scan it. The threat is non-zero whenever the CLI is run inside a pipeline. Given the current operator-facing usage this is low risk, but it's architecturally inconsistent — the echo sanitizes but the downstream data flow doesn't.

⚠️ Suggestions

  • src/ado/discovery.rs:282–286ADO_AW_PREVIEW_CONCURRENCY=0 is silently clamped. The .max(1) guard is correct and prevents a deadlock, but there's no warning when the user's env value parses to 0. A warn! or debug! log when the env value parses to a u64 but clamps to 1 would save future head-scratching.

  • src/ado/discovery.rs — no test for --source with CurrentRepo scope when repository.url is absent. test_scope_current_repo_excludes_definitions_without_repository exists, but there's no test verifying that resolve_for_command bails with the helpful "no ADO git remote detected" message when repo_name is empty and source_filter is set. Currently that bail lives in secrets.rs and the discovery module only sees the scope enum — the integration path is untested.

✅ What Looks Good

  • Security layering in the marker step is solid: sanitize_for_vso_logging + bash_single_quote_escape on the echo line, serde_json::json! for the JSON comment (properly encodes ", \, and special chars), and normalize_source_path strips \n/\r before anything is emitted. The attack paths for VSO command injection and bash quoting are well-covered where it matters most (the runtime surface).
  • The DiscoveryStatus::NotFound variant correctly handles the TOCTOU race (definition deleted between list and preview) without surfacing a noisy error or inflating the skip counters.
  • parse_marker_step is defensively written: trims leading whitespace (handles indented heredoc content), skips malformed JSON silently, accepts unknown fields via serde defaults, and is exercised by all the right edge cases.
  • Semaphore permits.max(1) prevents a deadlock if a user sets ADO_AW_PREVIEW_CONCURRENCY=0.
  • Fast-path lock-file lookup correctly normalizes paths (trim_start_matches('/') aligns with normalize_ado_yaml_path) so the optimization fires correctly in practice.
  • CompileContext::new signature change (takes input_path instead of input_dir) is a nice cleanup — less footgun surface, and the parent().unwrap_or(".") fallback is now encapsulated in one place.

Generated by Rust PR Reviewer for issue #624 · ● 1.6M ·

jamesadevine added a commit that referenced this pull request May 18, 2026
- `normalize_repo_url`: percent-decode before comparing, so a project
  named "My Project" matches whether ADO returns the encoded form
  (`My%20Project`) or the decoded form. The previous implementation
  assumed ADO always returns percent-encoded URLs; that assumption is
  documented in code now and the comparison is encoding-independent.
  New unit tests cover the encoded/decoded equivalence and the
  case-insensitive/trailing-slash behaviour.

- `discovered_to_matched`: stop silently truncating consumers that
  include multiple ado-aw templates. The `yaml_path` field used by
  `print_matched_summary` now joins every marker source with `, ` so
  e.g. `agents/a.md, agents/b.md` shows up honestly in the CLI
  summary. New unit test asserts both sources are surfaced.

- `##vso[` defence-in-depth: the marker step's runtime echo already
  neutralises `##vso[` and `##[` prefixes, but the same raw source
  string was flowing through `MarkerMetadata` -> `MatchedDefinition::yaml_path`
  -> `print_matched_summary` (which writes to stdout). When the CLI
  is invoked from inside an ADO pipeline step, the agent's stdout
  scanner would still pick up an attacker-controlled `##vso[...]`
  payload. New `sanitize_for_vso_logging` helper in the discovery
  module applies the same convention (`##vso[` -> `[vso-filtered][`,
  `##[` -> `[filtered][`) when building the `yaml_path`. New unit test
  asserts the sanitisation.

- `ADO_AW_PREVIEW_CONCURRENCY=0` now emits a `warn!` before clamping
  to 1, instead of silently masking the typo. Operators who set `=0`
  will see the warning and can correct the env value rather than
  wondering why their concurrency tuning had no effect.

- New unit test for the `--source` + no-git-remote bail in
  `secrets::resolve_for_command`: previously the helpful "no Azure
  DevOps git remote was detected; try --all-repos" error path was
  untested. Now asserted via a `tokio::test` that constructs an empty
  AdoContext and verifies the error message contains both the cause
  and the suggested mitigation.

All 1753 tests pass; clippy clean on touched files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Looks good overall — solid architecture, excellent test coverage, and careful attention to security. A few maintainability notes worth addressing before merge.

Findings

⚠️ Suggestions

  • [src/compile/extensions/ado_aw_marker.rs + src/ado/discovery.rs] sanitize_for_vso_logging is duplicated

    The function is defined privately and identically in both new files:

    fn sanitize_for_vso_logging(s: &str) -> String {
        s.replace("##vso[", "[vso-filtered][")
            .replace("##[", "[filtered][")
    }

    src/sanitize.rs already handles VSO neutralisation (for markdown context), and src/agent_stats.rs has a sanitize_for_markdown with similar intent. If the set of dangerous prefixes ever needs to expand, this must be updated in two independent places with no compiler enforcement. Worth extracting to src/sanitize.rs as a pub function and importing from both call sites.

  • [src/ado/discovery.rs:648-709] Side-effecting .filter() closure is confusing

    In resolve_definitions_via_discovery, the .filter() closure simultaneously decides inclusion and mutates four skip counters as a side effect. In the unfiltered case (normalized_filter.is_none()), it always returns true, so PreviewFailed/UnknownForbidden/UnknownRequiredParams items all land in kept and are only discarded later by discovered_to_matched. The variable name kept then misleads — it holds items that are about to be dropped. A two-pass approach (classify then warn then convert) or a manual for loop with explicit continue/push would be clearer and make it obvious that the warning counts and the final filtered set are derived from the same pass.

  • [src/ado/discovery.rs:474-508] is_direct_match silently misclassifies sources with non-.md suffixes

    let Some(stem) = marker.source.strip_suffix(".md").map(|s| format!("{s}.lock.yml")) else {
        return false;
    };

    If marker.source doesn't end in .md (e.g. a future extension allows .yaml-sourced agents), the function returns false and the definition is classified as Consumer instead of Direct. The test fixture names document the convention, but there's no defensive log or comment explaining why a non-.md source is a legitimate case for returning false vs a marker corruption. Worth a // non-standard source extension; treat conservatively as Consumer comment at minimum.

✅ What Looks Good

  • Security handling is thorough: both bash_single_quote_escape and sanitize_for_vso_logging are applied to the echo line before it reaches the generated bash. The JSON marker itself uses serde_json serialisation, so ", \n, and other special characters in source paths are correctly escaped. Injection tests exist for both vectors.
  • Forward-compat serde on MarkerMetadata: #[serde(default)] on all fields means a schema-2 marker with new fields won't break the current parser, and a schema-1 parser encountering schema-2 output degrades gracefully.
  • Fast-path for local lock files is a smart optimisation — skipping a network round-trip per already-compiled definition in the common case, while falling through to Preview as a defensive measure if the local parse fails.
  • ADO_AW_PREVIEW_CONCURRENCY=0 warning (clamp-with-warn instead of silent clamp) is the right UX choice.
  • Test coverage: 6 unit tests on the parser, 12 on discovery scope/classification logic, 4 integration tests for the marker across all four compile targets — comprehensive for a new infra feature.

Generated by Rust PR Reviewer for issue #624 · ● 1.4M ·

Copilot AI pushed a commit that referenced this pull request May 19, 2026
- `is_direct_match`: return `false` (not `markers.is_empty()`) for the
  marker-less case. The previous code returned `true` for 0 markers,
  which would misclassify a non-ado-aw definition as `Direct` if any
  future caller hit that path. Belt-and-braces — the current callsite
  in `classify_definition` already guards against it.

- `discovered_to_matched`: doc-comment claimed `UnknownRequiredParams`
  was propagated; implementation drops it. Keep the safer drop
  behaviour (we can't act on a marker-less definition) but surface a
  `warn!` summary from `resolve_definitions_via_discovery` so
  `secrets set --all-repos` operators can see when pipelines were
  skipped because of required-template-parameters / 403 / other
  preview failures. Doc comment updated to match.

- `AdoContext::repo_url`: percent-encode `project` and `repo_name` so
  `DiscoveryScope::CurrentRepo` works for projects whose names contain
  reserved characters (e.g. spaces). The lowercase normalize step
  can't reconcile a decoded local name with the encoded form ADO
  returns in `repository.url`.

- `AdoAwMarkerExtension`: bash-quote-escape the source path embedded
  in the runtime `echo` line. A markdown filename containing `'`
  (e.g. `agents/foo's.md`) would otherwise produce syntactically
  broken bash. New `bash_single_quote_escape` helper applies the
  canonical `'\''` idiom; the JSON marker line keeps the raw value
  because JSON has no quoting concern with `'`. Two new tests cover
  the idiom and a `foo's-agent.md` path.

- `src/detect.rs`: drop the now-stale `#[allow(dead_code)]` attrs on
  `MARKER_STEP_PREFIX`, `MarkerMetadata`, and `parse_marker_step`.
  All three are actively consumed by `src/ado/discovery.rs`.

- `resolve_for_command`: thread local-lock-file paths into discovery
  so the `process.yamlFilename` fast-path can skip Preview calls for
  locally-compiled pipelines. Best-effort scan — failures fall back
  to Preview-for-everything cleanly.

All 1741 tests pass; clippy clean on touched files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: jamesadevine <4742697+jamesadevine@users.noreply.github.com>
Copilot AI pushed a commit that referenced this pull request May 19, 2026
- `AdoAwMarkerExtension`: neutralise `##vso[` and `##[` logging-command
  prefixes in the source path before embedding it in the marker step's
  runtime `echo` line. Without this, a markdown filename like
  `agents/##vso[task.setvariable variable=FOO]value.md` would echo a
  literal `##vso[...]` sequence that the ADO build agent's stdout
  scanner treats as a task command — a logging-command-injection
  primitive any attacker controlling a filename could trigger. New
  `sanitize_for_vso_logging` helper mirrors the existing convention in
  `crate::agent_stats::sanitize_for_markdown` (`[vso-filtered][` /
  `[filtered][`). The `# ado-aw-metadata:` JSON line keeps the raw
  value (it's a YAML comment, not echoed to stdout). Two new tests:
  the sanitiser unit test and an end-to-end attack-payload roundtrip
  asserting the echo line is neutralised.

- `resolve_definitions_via_discovery`: the previous skip-counter
  implementation counted `UnknownRequiredParams` / `Forbidden` /
  `PreviewFailed` failures *before* applying `source_filter`, so under
  `--source agents/foo.md` the warnings would tell the user "N
  definitions skipped requiring template parameters" for definitions
  that had nothing to do with `agents/foo.md`. Split the counting:

  * without `--source`: per-status counts are honest (we're operating
    on every ado-aw pipeline) and the existing three warnings stand;
  * with `--source`: a single conservative `uninspectable` counter,
    surfaced as one warning that explicitly acknowledges we can't tell
    whether any of those skipped definitions would have been consumers
    of the filtered template.

- `src/ado/discovery.rs`: drop the file-level `#![allow(dead_code)]`.
  `resolve_definitions_via_discovery` and `discovered_to_matched` are
  now wired into `secrets.rs`; the suppression was hiding future
  dead-code regressions. Build is clean without it.

- `src/main.rs` (`SecretsCmd`): clarified `--source` help text — calls
  out that **without `--all-repos`, only the current repository is
  searched**. Saves the user-confusion case "I ran `secrets set
  GITHUB_TOKEN --source agents/foo.md` and got zero results" when
  they're in a different repo than the consumer pipelines.

All 1743 tests pass; clippy clean on touched files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: jamesadevine <4742697+jamesadevine@users.noreply.github.com>
Copilot AI pushed a commit that referenced this pull request May 19, 2026
- `resolve_definitions_via_discovery`: normalize the user-supplied
  `--source` value through `normalize_source_path` before comparing
  against `marker.source`. The marker stores the canonical form
  (`agents/foo.md`), so without normalization a user typing
  `--source ./agents/foo.md` or `--source agents\foo.md` (Windows)
  silently matched nothing. Re-export `normalize_source_path` from
  `crate::compile` so callers outside the compile module tree can
  reach it cleanly. New test asserts the three common variants
  (canonical, leading-`./`, backslash) all produce the same normalized
  string.

- `classify_definition` / `DiscoveryStatus::NotFound`: 404 from the
  Preview endpoint almost certainly means the definition was deleted
  between `list_definitions` and `preview_pipeline` (TOCTOU race).
  Previously routed through `PreviewFailed`, which inflated
  `skipped_failed` counts and confused operators. New
  `DiscoveryStatus::NotFound` variant is excluded from skip-warning
  counters in `resolve_definitions_via_discovery` and dropped by
  `discovered_to_matched` like the other non-actionable statuses.
  Debug-logged with the definition id+name so `--debug` users can
  still see what happened.

- `DefinitionSummary::revision`: doc comment claimed Preview-driven
  discovery uses it as a cache key, but no caching is implemented.
  Rewrote to say it's deserialised for a future cache and there is
  *no* caching yet, with a clear "see the discovery module for current
  behaviour" pointer.

- `DiscoveryScope::Explicit`: clarified the docstring to call out
  that no production callsite constructs this variant — `--definition-ids`
  uses the legacy `resolve_definitions` path before discovery ever
  runs. Variant is kept (not removed and not `#[cfg(test)]`-gated)
  because direct API consumers may want to feed pre-filtered IDs into
  discovery; the existing unit-test construction stays.

- `secrets::resolve_for_command`: bail early with a targeted error
  when `--source` is used without `--all-repos` outside a recognised
  ADO repo. The previous behaviour was a generic "No ado-aw pipelines
  found via Preview-driven discovery" message that gave no hint that
  the empty result was caused by the missing git remote. New error
  spells out the cause and suggests `--all-repos`.

All 1746 tests pass; clippy clean on touched files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: jamesadevine <4742697+jamesadevine@users.noreply.github.com>
Copilot AI pushed a commit that referenced this pull request May 19, 2026
- `normalize_repo_url`: percent-decode before comparing, so a project
  named "My Project" matches whether ADO returns the encoded form
  (`My%20Project`) or the decoded form. The previous implementation
  assumed ADO always returns percent-encoded URLs; that assumption is
  documented in code now and the comparison is encoding-independent.
  New unit tests cover the encoded/decoded equivalence and the
  case-insensitive/trailing-slash behaviour.

- `discovered_to_matched`: stop silently truncating consumers that
  include multiple ado-aw templates. The `yaml_path` field used by
  `print_matched_summary` now joins every marker source with `, ` so
  e.g. `agents/a.md, agents/b.md` shows up honestly in the CLI
  summary. New unit test asserts both sources are surfaced.

- `##vso[` defence-in-depth: the marker step's runtime echo already
  neutralises `##vso[` and `##[` prefixes, but the same raw source
  string was flowing through `MarkerMetadata` -> `MatchedDefinition::yaml_path`
  -> `print_matched_summary` (which writes to stdout). When the CLI
  is invoked from inside an ADO pipeline step, the agent's stdout
  scanner would still pick up an attacker-controlled `##vso[...]`
  payload. New `sanitize_for_vso_logging` helper in the discovery
  module applies the same convention (`##vso[` -> `[vso-filtered][`,
  `##[` -> `[filtered][`) when building the `yaml_path`. New unit test
  asserts the sanitisation.

- `ADO_AW_PREVIEW_CONCURRENCY=0` now emits a `warn!` before clamping
  to 1, instead of silently masking the typo. Operators who set `=0`
  will see the warning and can correct the env value rather than
  wondering why their concurrency tuning had no effect.

- New unit test for the `--source` + no-git-remote bail in
  `secrets::resolve_for_command`: previously the helpful "no Azure
  DevOps git remote was detected; try --all-repos" error path was
  untested. Now asserted via a `tokio::test` that constructs an empty
  AdoContext and verifies the error message contains both the cause
  and the suggested mitigation.

All 1753 tests pass; clippy clean on touched files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: jamesadevine <4742697+jamesadevine@users.noreply.github.com>
jamesadevine added a commit that referenced this pull request May 19, 2026
…#624)

Round-5 PR review cleanup. No bugs flagged — three maintainability
suggestions, all addressed:

- **F1: reuse `crate::sanitize::neutralize_pipeline_commands`.** The
  codebase already has a canonical pipeline-command neutraliser at
  `src/sanitize.rs:145` (used by the `SanitizeContent` /
  `SanitizeConfig` pipelines for safe outputs and front-matter
  sanitisation). The round-4 fix introduced two private
  `sanitize_for_vso_logging` helpers — one in
  `src/compile/extensions/ado_aw_marker.rs`, one in
  `src/ado/discovery.rs` — that hard-coded the
  `[vso-filtered][` / `[filtered][` form copied from
  `agent_stats::sanitize_for_markdown`. Both private copies removed in
  favour of the canonical helper. The canonical helper uses
  backtick-wrapping (`` `##vso[` `` / `` `##[` ``) which equally
  defeats the ADO agent's stdout scanner; the threat model is
  unchanged. Tests in both files updated to assert the canonical
  output. The two now-redundant unit tests for the local helpers are
  removed; their behavioural coverage already lives in
  `src/sanitize.rs:570-605`.

- **F2: two-pass classify-then-filter in
  `resolve_definitions_via_discovery`.** The previous shape mutated
  four counter variables as a side-effect inside a `.filter()` closure
  that *also* decided inclusion — the local `kept` variable then held
  items that would be dropped later by `discovered_to_matched`,
  misleading any reader. Rewritten as an explicit `for` loop that
  classifies + counts in pass 1, emits warnings + converts to
  `MatchedDefinition` in pass 2. Renamed `kept` to `selected` to
  match what the variable actually holds.

- **F3: comment on the non-`.md` fallthrough in `is_direct_match`.**
  Explains that the unreachable-today branch is a forward-compat
  measure for hypothetical `.yaml` / `.json` agent sources and that
  the conservative behaviour (treat as `Consumer`, not `Direct`)
  keeps write commands acting on the definition while labelling it
  honestly in the summary.

Net test delta: −2 (the two duplicate sanitizer unit tests). 1751
tests pass; clippy clean on touched files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Solid implementation — the architecture is well-thought-out and the security mitigations are thorough. Two correctness issues worth fixing before merge.


Findings

🐛 Bugs / Logic Issues

1. normalize_source_path double-escapes " in JSON marker (src/compile/extensions/ado_aw_marker.rs:62–75)

normalize_source_path replaces " with \" for safe embedding in the YAML header comment (source="..."). But when the already-escaped string is then passed to serde_json::json!, serde serializes the backslash again, writing "source":"agents/foo\\\"bar.md" in the JSON rather than the correct "agents/foo\"bar.md".

// ado_aw_marker.rs
let source = super::super::common::normalize_source_path(input_path);
// ^ source = r#"agents/foo\"bar.md"# if path contained "
let metadata_json = serde_json::json!({
    "source": source,   // ← serde_json double-escapes the \
})
.to_string();

The round-trip still works in practice (the --source CLI path also goes through normalize_source_path, so both sides carry the extra backslash), but the JSON stored in every compiled lock file is semantically wrong. Fix: derive a separate forward-slash-only helper (no quote escaping) for the JSON metadata, and keep normalize_source_path for the YAML header comment only.


2. is_direct_match false-positive for same-stem paths in different directories (src/ado/discovery.rs:476–500)

yaml_normalized == stem || yaml_normalized.ends_with(&format!("/{stem}"))

stem for source = "agents/foo.md" is "agents/foo.lock.yml". The ends_with branch matches a definition whose yamlFilename is other/agents/foo.lock.yml — a completely unrelated pipeline with the same lock-file name in a different folder. The consequence is a Direct label on what is actually a Consumer (or unrelated) pipeline.

Low operational impact (write commands still act on the definition), but the mislabelling could confuse users reviewing --dry-run output. Consider requiring yaml_normalized == stem or a full-suffix check that includes the path separator on the left side too:

yaml_normalized == stem
    || yaml_normalized == format!("agents/{stem}")  // explicit repo-root prefix
    || yaml_normalized.ends_with(&format!("/{stem}"))
        && marker.source.contains('/')  // only accept when source itself is under a directory

Or use a stricter policy: only classify Direct via the local-lock fast path (which uses the exact yamlFilename from the definition), and classify all Preview-path matches as Consumer without trying to distinguish.


⚠️ Suggestions

3. --source matching is case-sensitive but undocumented (src/ado/discovery.rs:652)

d.markers.iter().any(|m| m.source == src)

On case-insensitive file systems (Windows), a user passing --source Agents/Foo.md would silently get zero results because the marker stores agents/foo.md. Either document this in the CLI help string, or normalise to lowercase before comparison (which normalize_source_path currently doesn't do).


4. revision field is stored but never read; the doc comment warns "no caching implemented yet" (src/ado/mod.rs:232–238)

The field is purely speculative infrastructure. No concern for this PR, just flagging so a follow-up #[allow(dead_code)] or // TODO doesn't get forgotten.


✅ What Looks Good

  • Security mitigations are thorough: neutralize_pipeline_commands on the echoed source path, bash_single_quote_escape for the single-quoted bash string, and \n/\r stripping all being applied consistently. The unit test for ##vso[ injection directly validates the guard.
  • Error handling follows project conventions: anyhow::bail! with actionable messages, .context() on HTTP calls, no unwrap() on user-facing paths outside of tests.
  • The ADO_AW_PREVIEW_CONCURRENCY=0 deadlock guard (clamp to 1 with an explicit warn!) is a thoughtful safety catch that most implementations miss.
  • The build_lock_path_map / normalize_ado_yaml_path key alignment is correct — both replace('\\', "/").trim_start_matches('/') — so the fast-path lookup works.
  • Forward-compat #[serde(default)] on MarkerMetadata and skipping malformed JSON on parse_marker_step are exactly right for a schema that's expected to grow.
  • The source_without_all_repos_bails_when_no_git_remote test is a good guard against the most likely user footgun.

Generated by Rust PR Reviewer for issue #624 · ● 1.8M ·

jamesadevine added a commit that referenced this pull request May 19, 2026
Three concrete issues from the latest review pass:

1. `normalize_source_path` double-escaped `"` in the JSON marker.
   The helper escaped `"` -> `\"` before the path reached
   `serde_json::json!`, which then escaped the backslash again. The
   marker stored `"source":"agents/foo\\\"bar.md"` instead of the
   canonical `"source":"agents/foo\"bar.md"` — every round-trip
   carried a spurious backslash. Move the `"` -> `\"` escape into
   `generate_header_comment` (the only YAML-comment surface that
   needs it) and leave `normalize_source_path` returning the
   canonical form for the JSON marker and the `--source` filter.

2. `is_direct_match` false-positive for same-stem in different
   directories. The previous `yaml_normalized.ends_with("/{stem}")`
   branch would label an unrelated pipeline `Direct` when its
   `yamlFilename` happened to share the same trailing
   `<stem>.lock.yml` (e.g. marker `agents/foo.md` matched
   `other/agents/foo.lock.yml`). Both sides are already normalised
   to repo-root-relative paths without a leading slash, so strict
   equality is the correct check. Updated test name and added a
   regression test for the same-stem-different-directory case.

3. Documented case-sensitivity of `--source` in CLI help and the
   discovery module's doc comment so Windows users don't get silent
   zero-result runs from a case mismatch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Looks good — well-designed feature with thorough tests and solid security thinking. A few minor issues worth addressing before merging.


Findings

🐛 Bugs / Logic Issues

  • src/ado/discovery.rs:build_lock_path_map./-prefixed paths silently miss the fast-path

    build_lock_path_map strips leading / via trim_start_matches('/') but not leading ./. If detect_pipelines ever returns a path starting with ./ (e.g. ./agents/foo.lock.yml from a PWD-relative scan), the map key would be ./agents/foo.lock.yml but normalize_ado_yaml_path on the ADO yaml_filename side produces agents/foo.lock.yml. The fast-path lookup silently misses, and falls back to a Preview call. Not a correctness bug (fallback is fine), but degrades the intended optimization. The existing test returns PathBuf::from("pipeline.yml") so this doesn't bite in practice today, but it's fragile. Fix: add trim_start_matches("./") after the / trim, mirroring normalize_source_path.

  • src/ado/discovery.rs:resolve_definitions_via_discovery--source /abs/path.md silently matches nothing

    normalize_source_path strips ./ but not a leading /. A user passing --source /agents/foo.md (absolute path form) gets normalized_filter = Some("/agents/foo.md"), which never equals a marker's agents/foo.md. The CLI docs say to pass relative paths, but a silent no-match is worse than an error. Adding a leading-/ strip to normalize_source_path (or at the --source input site) would make this robust.

⚠️ Suggestions

  • src/ado/discovery.rs:classify_definitionexpect() in spawned task

    let _permit = sem.acquire_owned().await.expect("semaphore not closed");

    The semaphore is kept alive via Arc so this won't fire in practice, but a panic here propagates as a JoinError caught by Err(e) => warn!("Discovery worker panicked: {e}"), silently dropping that definition from results. The expect message is already descriptive; consider unwrap_or_else(|_| unreachable!("semaphore behind Arc cannot be closed")) or a comment making the invariant explicit. Very minor.

  • src/ado/discovery.rs:is_direct_match — non-.md sources silently become Consumer

    The comment documents this as intentional ("conservative fallback"). Fine as-is, but worth a debug-log line so operators who accidentally compile from a .yaml source don't wonder why every definition shows up as Consumer.

  • src/compile/extensions/mod.rsAdoAwMarkerExtension in ExtensionPhase::Tool

    The marker has no tool semantics. ExtensionPhase::Setup would be a more accurate grouping and would naturally place the marker step before GitHub/SafeOutputs setup-phase steps in emit order. The current placement works because collect_extensions pushes AdoAwMarker first, but the ordering is then an implicit dependency on insertion order rather than phase.

✅ What Looks Good

  • Security: The ##vso[ injection mitigation is thorough — both the echo line in the generated step (neutralize_pipeline_commands) and the yaml_path field in discovered_to_matched are sanitized. The distinction between why the JSON marker line doesn't need it (comments inside step bodies bypass the ADO logging-command scanner) is correctly documented and tested.
  • normalize_source_path refactor: Removing " escaping from the shared helper and letting serde_json::json! handle JSON encoding is the right fix. The round-trip regression test (json_marker_quote_in_source_round_trips_correctly) catches the double-escape bug cleanly.
  • normalize_repo_url: Percent-decode + ASCII-lowercase handles ADO's mixed-encoding URL forms correctly, and the explanation in the comment makes the non-obvious necessity clear.
  • Error classification: PreviewErrorDiscoveryStatus mapping is clean, and the distinct NotFound status for TOCTOU races avoids spurious "Preview Failed" warnings for definitions that just got deleted.
  • ADO_AW_PREVIEW_CONCURRENCY=0 handling: Warn + clamp rather than silent deadlock avoidance. The .max(1) on the semaphore is redundant but harmless as a belt-and-suspenders guard.
  • is_direct_match strict equality: The comment explaining why the earlier ends_with tail-match produced false-positives is exactly the kind of context that prevents the regression from coming back.
  • Test coverage: 6 unit tests for parse_marker_step, 12 for scope/classification, 4 integration tests for per-target marker emission — comprehensive for the surface area changed.

Generated by Rust PR Reviewer for issue #624 · ● 1.9M ·

@github-actions
Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Looks good overall — well-designed feature with thoughtful security treatment, solid test coverage, and good documentation of migration trade-offs. A few items worth a second look.

Findings

⚠️ Suggestions

  • src/ado/discovery.rs:732-743 — Skip-warning counters include non-ado-aw definitions in --all-repos mode.
    The skip counters (skipped_required_params, skipped_forbidden, skipped_failed) are incremented before the matches_filter check, so they count every failed definition in the project — including pipelines that aren't ado-aw at all (they'll show up as failed previews). In a large project with hundreds of non-ado-aw 400/403 responses, the warning "Discovery skipped N definitions whose Pipeline Preview requires templateParameters..." could be quite noisy and misleading. Consider only counting failures for definitions that would have been ado-aw candidates, or noting in the warning text that "N could include non-ado-aw pipelines."

  • src/compile/extensions/mod.rs:130CompileContext::new silently uses Path::new("") when input_path has no parent.
    input_path.parent() for a bare filename "foo.md" returns Some(Path::new("")), not None, so the unwrap_or(Path::new(".")) fallback is never reached in that case. Path::new("") and Path::new(".") behave differently when running git -C "" vs git -C . in infer_ado_context. This is pre-existing (callers previously did the same parent().unwrap_or(".") themselves), but worth a comment noting the Some("") behaviour since it's now hidden inside the function.

  • src/ado/discovery.rs:534marker_origin_matches when current_org resolves to "".
    In resolve_definitions_via_discovery, if ctx.org_name() returns None (org_url has no path segment), current_org becomes "". A strict marker (non-empty org/repo) would then fail the eq_ignore_ascii_case("") check and be silently excluded. In practice this requires --source --all-repos with a malformed/missing ADO context (which should also fail auth), but there's no test for this case and no error surfaced to the user. At minimum a debug! log when current_org is empty while a source filter is active would help diagnose silent mismatches.

  • src/compile/extensions/ado_aw_marker.rsnormalize_source_path with absolute input paths.
    If a user invokes ado-aw compile /absolute/path/to/agents/foo.md, the marker stores the full absolute path (/absolute/path/to/agents/foo.md). A --source agents/foo.md filter would then never match. normalize_source_path strips ./ but not absolute path components. This is pre-existing behaviour (not a regression), but worth a doc note or validation in the --source handler since the mismatch is silent.

✅ What Looks Good

  • Security sanitization is thorough and correct. The ##vso[ neutralisation in the echo line, the bash_single_quote_escape idiom, and the discovered_to_matched yaml_path sanitisation are all well-motivated with tests that verify the injection defence. The analysis in the comments that the JSON # ado-aw-meta(redacted) comment is never sent to stdout by bash (only the echo` line is) is accurate.
  • The normalize_source_path double-escaping fix is a real correctness fix. Feeding the raw path into serde_json::json! and letting serde_json handle JSON-level escaping (rather than pre-escaping " and getting the backslash double-encoded) is the right approach, and the round-trip test json_marker_quote_in_source_round_trips_correctly pins it down nicely.
  • is_direct_match equality-over-suffix fix. The regression test consumer_when_same_stem_in_different_directory captures exactly the case that was broken and clearly explains why ends_with was wrong — good defensive documentation.
  • 1ES generate_setup_job now routes extension setup_steps — previously the only compile target that didn't, meaning the always-on marker would have been silently absent in 1ES-compiled pipelines without this fix.
  • Test coverage is comprehensive. Unit tests for parse_marker_step, is_direct_match, apply_scope_filter, marker_origin_matches, normalize_repo_url, and discovered_to_matched cover all the meaningful edge cases. The integration tests asserting exactly one marker per compiled target are the right way to regression-pin the extension wiring.

Generated by Rust PR Reviewer for issue #624 · ● 1.3M ·

@github-actions
Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Looks good — well-designed feature with strong test coverage. A few items worth addressing before merge.

Findings

🐛 Bugs / Logic Issues

  • src/ado/discovery.rs:build_lock_path_map vs normalize_ado_yaml_path divergence riskbuild_lock_path_map builds keys by hand (replace('\\', "/") + trim_start_matches('/')) while the lookup side uses super::normalize_ado_yaml_path. These are semantically equivalent today but will silently diverge if normalize_ado_yaml_path ever gains additional canonicalisation (e.g. lowercasing or URL-decoding). Using the shared helper for both sides would make this future-proof:

    let key = super::normalize_ado_yaml_path(&normalized.trim_start_matches('/'));
    map.insert(key, path.clone());
  • src/ado/discovery.rs:resolve_definitions_via_discovery — uninspectable definitions land in selected for --all-repos alone — When normalized_filter is None, matches_filter is true for every definition including UnknownRequiredParams, UnknownForbidden, and PreviewFailed ones. They get pushed into selected and then discarded by discovered_to_matched. For a large project this bloats the intermediate vec. Not a correctness bug (the counters and final output are both correct), but an easy win: push only when both matches_filter and the status is actionable, or push first and let discovered_to_matched do the dual duty it already does. Either way the counts must still be tallied before the push guard.

⚠️ Suggestions

  • src/compile/extensions/ado_aw_marker.rs — missing trailing newline — The file ends with } without a final newline (\ No newline at end of file in the diff). cargo fmt will flag this; easy fix.

  • src/ado/discovery.rs:is_direct_match — heuristic boundary worth a commentmarkers.len() > 1 as a "Consumer" signal is correct for the current architecture (a Direct pipeline's expanded YAML has exactly one marker from its own Setup job). But the function doc doesn't explain why > 1 implies Consumer. A one-liner would help future readers who wonder whether this was a deliberate design or an arbitrary threshold:

    // A compiled ado-aw pipeline's expanded YAML carries exactly one marker
    // (its own Setup-job step). More than one means the YAML was produced by
    // expanding a consumer that includes multiple templates — none of which
    // are this definition's "own" root YAML.
    if markers.len() > 1 {
        return false;
    }
  • src/ado/discovery.rs:resolve_definitions_via_discovery--source without --all-repos missing hint on empty match — When the discovery path returns Ok(Some([])) for a --source filter scoped to CurrentRepo, the empty-match bail in run_set/run_list/run_delete says "No ado-aw pipelines found via Preview-driven discovery." but doesn't tell the user whether the template simply has no consumers in this repo or whether they forgot --all-repos. A small hint like "...try --all-repos to search the entire project" would match the quality of the other error messages.

✅ What Looks Good

  • Double-escape regression fix: removing the replace('"', "\\\"") from normalize_source_path and letting serde_json::json! handle JSON encoding is the correct approach — the old path would have stored \" in the JSON which serde then re-escaped to \\". The regression test (json_marker_quote_in_source_round_trips_correctly) pins this cleanly.
  • ##vso[ injection defence in depth: the echo line goes through neutralize_pipeline_commands and bash_single_quote_escape, with a targeted test for each threat. The discovered_to_matched neutralization on the yaml_path field adds another layer for stdout surfaces.
  • Semaphore env-tunability with deadlock-avoidance warning: clamping ADO_AW_PREVIEW_CONCURRENCY=0 to 1 with a warn! (rather than silently using 0 and deadlocking) is the right UX choice.
  • MarkerMetadata forward-compat design: serde(default) on all fields + skip-on-malformed-JSON in parse_marker_step means future schema additions won't break existing callers. The test_parse_marker_step_tolerates_extra_json_fields test correctly pins this.
  • CompileContext::new parent-dir fix: the Some(p) if !p.as_os_str().is_empty() guard for bare filenames returning "" from Path::parent() is subtler than the old unwrap_or(Path::new(".")) and the comment explaining why is valuable.
  • Test coverage: 12 unit tests in discovery.rs, 6 in detect.rs, 4 integration tests across all four compile targets, and two async precondition tests in secrets.rs — comprehensive for the surface area changed.

Generated by Rust PR Reviewer for issue #624 · ● 721.4K ·

jamesadevine and others added 10 commits May 19, 2026 14:24
…overy

Adds project-scope token management via two new flags on `secrets set`,
`secrets list`, and `secrets delete`:

- `--all-repos` — operate on every ado-aw pipeline ADO knows about in
  the project (direct ado-aw definitions *and* consumer pipelines that
  include ado-aw templates), regardless of which repo their root YAML
  lives in.
- `--source <path>` — filter to consumers of one specific template.

Both flags activate a new Preview-driven discovery path that calls
`POST /_apis/pipelines/{id}/preview` per definition and scans the
expanded YAML for an `# ado-aw-metadata: {…}` JSON marker. The legacy
lexical local-fixture matcher remains the default; `--definition-ids`
remains the explicit-ID escape hatch.

To make discovery work, every compiled pipeline now carries a marker
via a new always-on `AdoAwMarkerExtension`. The marker lives inside a
bash Setup-job step because ADO's Preview API strips top-of-document
comments during YAML expansion (verified empirically against live def
2434 in `msazuresphere/4x4`) but preserves comments inside step
bodies. Uniform across all four targets (standalone / 1es / job /
stage); no per-target placement special-casing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- `is_direct_match`: return `false` (not `markers.is_empty()`) for the
  marker-less case. The previous code returned `true` for 0 markers,
  which would misclassify a non-ado-aw definition as `Direct` if any
  future caller hit that path. Belt-and-braces — the current callsite
  in `classify_definition` already guards against it.

- `discovered_to_matched`: doc-comment claimed `UnknownRequiredParams`
  was propagated; implementation drops it. Keep the safer drop
  behaviour (we can't act on a marker-less definition) but surface a
  `warn!` summary from `resolve_definitions_via_discovery` so
  `secrets set --all-repos` operators can see when pipelines were
  skipped because of required-template-parameters / 403 / other
  preview failures. Doc comment updated to match.

- `AdoContext::repo_url`: percent-encode `project` and `repo_name` so
  `DiscoveryScope::CurrentRepo` works for projects whose names contain
  reserved characters (e.g. spaces). The lowercase normalize step
  can't reconcile a decoded local name with the encoded form ADO
  returns in `repository.url`.

- `AdoAwMarkerExtension`: bash-quote-escape the source path embedded
  in the runtime `echo` line. A markdown filename containing `'`
  (e.g. `agents/foo's.md`) would otherwise produce syntactically
  broken bash. New `bash_single_quote_escape` helper applies the
  canonical `'\''` idiom; the JSON marker line keeps the raw value
  because JSON has no quoting concern with `'`. Two new tests cover
  the idiom and a `foo's-agent.md` path.

- `src/detect.rs`: drop the now-stale `#[allow(dead_code)]` attrs on
  `MARKER_STEP_PREFIX`, `MarkerMetadata`, and `parse_marker_step`.
  All three are actively consumed by `src/ado/discovery.rs`.

- `resolve_for_command`: thread local-lock-file paths into discovery
  so the `process.yamlFilename` fast-path can skip Preview calls for
  locally-compiled pipelines. Best-effort scan — failures fall back
  to Preview-for-everything cleanly.

All 1741 tests pass; clippy clean on touched files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- `AdoAwMarkerExtension`: neutralise `##vso[` and `##[` logging-command
  prefixes in the source path before embedding it in the marker step's
  runtime `echo` line. Without this, a markdown filename like
  `agents/##vso[task.setvariable variable=FOO]value.md` would echo a
  literal `##vso[...]` sequence that the ADO build agent's stdout
  scanner treats as a task command — a logging-command-injection
  primitive any attacker controlling a filename could trigger. New
  `sanitize_for_vso_logging` helper mirrors the existing convention in
  `crate::agent_stats::sanitize_for_markdown` (`[vso-filtered][` /
  `[filtered][`). The `# ado-aw-metadata:` JSON line keeps the raw
  value (it's a YAML comment, not echoed to stdout). Two new tests:
  the sanitiser unit test and an end-to-end attack-payload roundtrip
  asserting the echo line is neutralised.

- `resolve_definitions_via_discovery`: the previous skip-counter
  implementation counted `UnknownRequiredParams` / `Forbidden` /
  `PreviewFailed` failures *before* applying `source_filter`, so under
  `--source agents/foo.md` the warnings would tell the user "N
  definitions skipped requiring template parameters" for definitions
  that had nothing to do with `agents/foo.md`. Split the counting:

  * without `--source`: per-status counts are honest (we're operating
    on every ado-aw pipeline) and the existing three warnings stand;
  * with `--source`: a single conservative `uninspectable` counter,
    surfaced as one warning that explicitly acknowledges we can't tell
    whether any of those skipped definitions would have been consumers
    of the filtered template.

- `src/ado/discovery.rs`: drop the file-level `#![allow(dead_code)]`.
  `resolve_definitions_via_discovery` and `discovered_to_matched` are
  now wired into `secrets.rs`; the suppression was hiding future
  dead-code regressions. Build is clean without it.

- `src/main.rs` (`SecretsCmd`): clarified `--source` help text — calls
  out that **without `--all-repos`, only the current repository is
  searched**. Saves the user-confusion case "I ran `secrets set
  GITHUB_TOKEN --source agents/foo.md` and got zero results" when
  they're in a different repo than the consumer pipelines.

All 1743 tests pass; clippy clean on touched files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- `resolve_definitions_via_discovery`: normalize the user-supplied
  `--source` value through `normalize_source_path` before comparing
  against `marker.source`. The marker stores the canonical form
  (`agents/foo.md`), so without normalization a user typing
  `--source ./agents/foo.md` or `--source agents\foo.md` (Windows)
  silently matched nothing. Re-export `normalize_source_path` from
  `crate::compile` so callers outside the compile module tree can
  reach it cleanly. New test asserts the three common variants
  (canonical, leading-`./`, backslash) all produce the same normalized
  string.

- `classify_definition` / `DiscoveryStatus::NotFound`: 404 from the
  Preview endpoint almost certainly means the definition was deleted
  between `list_definitions` and `preview_pipeline` (TOCTOU race).
  Previously routed through `PreviewFailed`, which inflated
  `skipped_failed` counts and confused operators. New
  `DiscoveryStatus::NotFound` variant is excluded from skip-warning
  counters in `resolve_definitions_via_discovery` and dropped by
  `discovered_to_matched` like the other non-actionable statuses.
  Debug-logged with the definition id+name so `--debug` users can
  still see what happened.

- `DefinitionSummary::revision`: doc comment claimed Preview-driven
  discovery uses it as a cache key, but no caching is implemented.
  Rewrote to say it's deserialised for a future cache and there is
  *no* caching yet, with a clear "see the discovery module for current
  behaviour" pointer.

- `DiscoveryScope::Explicit`: clarified the docstring to call out
  that no production callsite constructs this variant — `--definition-ids`
  uses the legacy `resolve_definitions` path before discovery ever
  runs. Variant is kept (not removed and not `#[cfg(test)]`-gated)
  because direct API consumers may want to feed pre-filtered IDs into
  discovery; the existing unit-test construction stays.

- `secrets::resolve_for_command`: bail early with a targeted error
  when `--source` is used without `--all-repos` outside a recognised
  ADO repo. The previous behaviour was a generic "No ado-aw pipelines
  found via Preview-driven discovery" message that gave no hint that
  the empty result was caused by the missing git remote. New error
  spells out the cause and suggests `--all-repos`.

All 1746 tests pass; clippy clean on touched files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- `normalize_repo_url`: percent-decode before comparing, so a project
  named "My Project" matches whether ADO returns the encoded form
  (`My%20Project`) or the decoded form. The previous implementation
  assumed ADO always returns percent-encoded URLs; that assumption is
  documented in code now and the comparison is encoding-independent.
  New unit tests cover the encoded/decoded equivalence and the
  case-insensitive/trailing-slash behaviour.

- `discovered_to_matched`: stop silently truncating consumers that
  include multiple ado-aw templates. The `yaml_path` field used by
  `print_matched_summary` now joins every marker source with `, ` so
  e.g. `agents/a.md, agents/b.md` shows up honestly in the CLI
  summary. New unit test asserts both sources are surfaced.

- `##vso[` defence-in-depth: the marker step's runtime echo already
  neutralises `##vso[` and `##[` prefixes, but the same raw source
  string was flowing through `MarkerMetadata` -> `MatchedDefinition::yaml_path`
  -> `print_matched_summary` (which writes to stdout). When the CLI
  is invoked from inside an ADO pipeline step, the agent's stdout
  scanner would still pick up an attacker-controlled `##vso[...]`
  payload. New `sanitize_for_vso_logging` helper in the discovery
  module applies the same convention (`##vso[` -> `[vso-filtered][`,
  `##[` -> `[filtered][`) when building the `yaml_path`. New unit test
  asserts the sanitisation.

- `ADO_AW_PREVIEW_CONCURRENCY=0` now emits a `warn!` before clamping
  to 1, instead of silently masking the typo. Operators who set `=0`
  will see the warning and can correct the env value rather than
  wondering why their concurrency tuning had no effect.

- New unit test for the `--source` + no-git-remote bail in
  `secrets::resolve_for_command`: previously the helpful "no Azure
  DevOps git remote was detected; try --all-repos" error path was
  untested. Now asserted via a `tokio::test` that constructs an empty
  AdoContext and verifies the error message contains both the cause
  and the suggested mitigation.

All 1753 tests pass; clippy clean on touched files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…#624)

Round-5 PR review cleanup. No bugs flagged — three maintainability
suggestions, all addressed:

- **F1: reuse `crate::sanitize::neutralize_pipeline_commands`.** The
  codebase already has a canonical pipeline-command neutraliser at
  `src/sanitize.rs:145` (used by the `SanitizeContent` /
  `SanitizeConfig` pipelines for safe outputs and front-matter
  sanitisation). The round-4 fix introduced two private
  `sanitize_for_vso_logging` helpers — one in
  `src/compile/extensions/ado_aw_marker.rs`, one in
  `src/ado/discovery.rs` — that hard-coded the
  `[vso-filtered][` / `[filtered][` form copied from
  `agent_stats::sanitize_for_markdown`. Both private copies removed in
  favour of the canonical helper. The canonical helper uses
  backtick-wrapping (`` `##vso[` `` / `` `##[` ``) which equally
  defeats the ADO agent's stdout scanner; the threat model is
  unchanged. Tests in both files updated to assert the canonical
  output. The two now-redundant unit tests for the local helpers are
  removed; their behavioural coverage already lives in
  `src/sanitize.rs:570-605`.

- **F2: two-pass classify-then-filter in
  `resolve_definitions_via_discovery`.** The previous shape mutated
  four counter variables as a side-effect inside a `.filter()` closure
  that *also* decided inclusion — the local `kept` variable then held
  items that would be dropped later by `discovered_to_matched`,
  misleading any reader. Rewritten as an explicit `for` loop that
  classifies + counts in pass 1, emits warnings + converts to
  `MatchedDefinition` in pass 2. Renamed `kept` to `selected` to
  match what the variable actually holds.

- **F3: comment on the non-`.md` fallthrough in `is_direct_match`.**
  Explains that the unreachable-today branch is a forward-compat
  measure for hypothetical `.yaml` / `.json` agent sources and that
  the conservative behaviour (treat as `Consumer`, not `Direct`)
  keeps write commands acting on the definition while labelling it
  honestly in the summary.

Net test delta: −2 (the two duplicate sanitizer unit tests). 1751
tests pass; clippy clean on touched files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three concrete issues from the latest review pass:

1. `normalize_source_path` double-escaped `"` in the JSON marker.
   The helper escaped `"` -> `\"` before the path reached
   `serde_json::json!`, which then escaped the backslash again. The
   marker stored `"source":"agents/foo\\\"bar.md"` instead of the
   canonical `"source":"agents/foo\"bar.md"` — every round-trip
   carried a spurious backslash. Move the `"` -> `\"` escape into
   `generate_header_comment` (the only YAML-comment surface that
   needs it) and leave `normalize_source_path` returning the
   canonical form for the JSON marker and the `--source` filter.

2. `is_direct_match` false-positive for same-stem in different
   directories. The previous `yaml_normalized.ends_with("/{stem}")`
   branch would label an unrelated pipeline `Direct` when its
   `yamlFilename` happened to share the same trailing
   `<stem>.lock.yml` (e.g. marker `agents/foo.md` matched
   `other/agents/foo.lock.yml`). Both sides are already normalised
   to repo-root-relative paths without a leading slash, so strict
   equality is the correct check. Updated test name and added a
   regression test for the same-stem-different-directory case.

3. Documented case-sensitivity of `--source` in CLI help and the
   discovery module's doc comment so Windows users don't get silent
   zero-result runs from a case mismatch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When two repos in the same ADO project both define a file of the
same name (e.g. each has its own `agents/foo.md`), today's
`--source agents/foo.md` would silently match consumers of either
template — the marker JSON carried only `source` and operators had
no way to scope writes to "my" agents/foo.md.

Add `org` and `repo` to the marker payload:

* `src/detect.rs` — `MarkerMetadata` gains two `#[serde(default)]`
  `String` fields; struct now derives `Default` so test fixtures
  can use the `..Default::default()` spread.
* `src/compile/extensions/ado_aw_marker.rs` — pull org from
  `ctx.ado_org()` and repo from `ctx.ado_context.repo_name`,
  lower-cased (ADO identifiers are case-insensitive). Empty when the
  compiler ran outside an ADO checkout; the non-GitHub-remote guard
  ensures this is rare in production. Echo line gains
  `org=... repo=...` for build-log readability.
* `src/ado/mod.rs` — add `AdoContext::org_name()` accessor that
  mirrors `CompileContext::ado_org`; lives on the struct so
  non-compile callers (Preview-driven discovery) can reuse it.
* `src/ado/discovery.rs` — when `source_filter` is active,
  `resolve_definitions_via_discovery` also requires the marker's
  `(org, repo)` to equal the current `ctx`'s. Markers with empty
  `org` and `repo` (legacy / non-ADO compiles) match leniently
  so already-deployed pipelines remain discoverable until they are
  recompiled. Helper `marker_origin_matches` extracted with four
  dedicated unit tests covering strict, case-insensitive, lenient,
  and half-populated cases.

Schema version stays at 1 — this PR has not shipped, so there are
no v1-without-org/repo markers in the wild that would need the
bump for forward-compat detection. New fields parse fine on either
side via `#[serde(default)]`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. `src/ado/discovery.rs:732` — skip-warning text overclaimed
   ado-aw membership.

   In `--all-repos` mode the per-status counters
   (`skipped_required_params` / `_forbidden` / `_failed`) fired
   for every failed Preview, including pipelines that aren't ado-aw
   at all. A project with hundreds of non-ado-aw pipelines that
   legitimately require `templateParameters` would emit
   "Discovery skipped N definitions whose Pipeline Preview requires
   templateParameters" when none of them were ado-aw consumers in
   the first place. We literally cannot tell which is which without
   successful Preview output.

   Consolidated the three separate `warn!` messages into one that's
   honest about uncertainty ("any ado-aw pipelines among them have
   been silently skipped"), with the per-status breakdown demoted to
   `debug!` for operators investigating. Counter variables renamed
   (`uninspectable_*`) to reflect what they actually count.

2. `src/compile/extensions/mod.rs:130` — `Path::parent()` returns
   `Some(Path::new(""))` for bare filenames, not `None`, so the
   `unwrap_or(Path::new("."))` fallback never fired for inputs like
   `foo.md`. Empty paths passed to `git -C "" remote get-url`
   behave differently from `git -C "."`. Match-arm now normalises
   both the `None` and empty-`Some` cases to `.`.

3. `src/secrets.rs:202` + `src/ado/discovery.rs` —
   `--all-repos --source` could silently exclude every strict
   marker when `ctx.org_name()` returned `None` or
   `ctx.repo_name` was empty. The existing guard only fired in the
   `!all_repos` branch. Extended the guard to fire for any
   `--source` invocation when the current `(org, repo)` is
   unresolvable, with an actionable error pointing the user to
   `--definition-ids` as the escape hatch. New test:
   `source_with_all_repos_bails_when_org_url_unresolvable`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The always-on `ado-aw-marker` extension was emitting its metadata
step via `setup_steps`, which forced `generate_setup_job` to spin
up a dedicated `- job: Setup` block in every compiled pipeline.
Minimal agentic pipelines with no user `setup:` declarations, no
PR/pipeline filters, and no other Setup-contributing extensions were
paying for a whole extra pool agent + agent boot just to emit a
single-line metadata comment.

Move the marker emission to `prepare_steps` so the step lands
inside the always-present Agent job's `{{ prepare_steps }}` block.
No extra job, no extra pool time — the marker is still parsed by
`parse_marker_step` the same way (scans for the
`# ado-aw-metadata:` line anywhere in the YAML).

The change requires extending the `CompilerExtension::prepare_steps`
signature to accept `&CompileContext` — the marker needs
`input_path`, `ado_context`, and `front_matter.target` to
build its JSON. All five existing `prepare_steps` implementors
(cache_memory, lean, python, node, dotnet) gain an unused
`_ctx: &CompileContext` parameter. The macro dispatcher and
`generate_prepare_steps` propagate ctx through.

Tests:

* All five existing `prepare_steps` unit tests in `extensions/tests.rs`
  updated to construct a `CompileContext` via the existing `ctx_from`
  helper.
* All seven `generate_prepare_steps` tests in `common.rs` updated to
  pass a `CompileContext::for_test(&fm)`.
* All seven marker-emit unit tests in `ado_aw_marker.rs` updated to
  call `prepare_steps(&ctx)` (now returns `Vec<String>` directly,
  no `Result` wrapper).
* New regression test `test_marker_does_not_create_setup_job_for_minimal_pipeline`
  asserts a minimal compiled pipeline contains the marker line but
  no `- job: Setup` block — guards against future re-introduction
  of the unnecessary Setup job.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. `src/ado/discovery.rs::build_lock_path_map` — was hand-rolling the
   `\\` → `/` + leading-slash-strip normalisation while the lookup
   side used `super::normalize_ado_yaml_path`. Functionally
   equivalent today, but the two would silently diverge if
   `normalize_ado_yaml_path` ever gained case-folding or
   URL-decoding. Route both sides through the shared helper.

2. `src/ado/discovery.rs::resolve_definitions_via_discovery` — in
   `--all-repos` mode (no source filter), uninspectable
   definitions (`UnknownRequiredParams` / `UnknownForbidden` /
   `PreviewFailed`) and `NotAdoAw` / `NotFound` defs were
   ending up in the `selected` vec only to be discarded by
   `discovered_to_matched`. In a large project where most
   definitions are non-ado-aw, this could allocate a vec of
   hundreds of dead entries. Push only actionable statuses
   (`Direct` / `Consumer`) into `selected`. Counts are still
   tallied before the guard so the warning text is unchanged.

3. `src/compile/extensions/ado_aw_marker.rs` — added a trailing
   newline (was failing `cargo fmt --check`).

4. `src/ado/discovery.rs::is_direct_match` — expanded the
   `markers.len() > 1` comment to explain *why* multiple markers
   imply Consumer: a compiled ado-aw pipeline's expanded YAML carries
   exactly one marker (its own Setup-job step); more than one means
   the YAML was produced by expanding a consumer that
   `template:`-includes multiple ado-aw lock files.

5. `src/secrets.rs` — empty-match hints used to lump all three
   flag combinations into one of two messages. Factored an
   `empty_match_hint(all_repos, source)` helper that branches on
   the four `(bool, Option)` cases, so `--source agents/foo.md`
   on its own now points the user at `--all-repos` when nothing
   matches in the current repo. All three callers (`run_set` /
   `run_list` / `run_delete`) consolidated onto the helper.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jamesadevine jamesadevine force-pushed the devinejames/secret-setting branch from 54228e7 to 9b5a232 Compare May 19, 2026 13:27
@github-actions
Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Looks good — solid implementation with thorough tests. A few minor items below, none blocking.


Findings

⚠️ Suggestions

  • ado_aw_marker.rs — PR description mischaracterizes the hook used.
    The description and PR body say "via the existing CompilerExtension::setup_steps hook" and "Setup-job bash step", but the actual implementation correctly uses fn prepare_steps(...) (Agent job prepare phase). The code comment explains the deliberate choice: a setup job would spin up an extra pool agent just for a metadata comment. The mismatch only matters for future contributors reading the description, but AGENTS.md carries a similar "Setup-job" description in its stored-memory block — worth a one-liner correction there to avoid confusion.

  • src/ado/discovery.rsDiscoveryScope::Explicit is unreachable dead code.
    The variant is documented "Reserved for future use", but no production callsite constructs it, and --definition-ids short-circuits before discovery is invoked. Since clippy is reportedly clean, there must be an #[allow(dead_code)] somewhere (or it's pub so the compiler doesn't warn). Minor, but if this variant is truly reserved-for-future and not tested, a // TODO or a #[doc(hidden)] might communicate the intent more clearly than leaving it silently unreachable.

  • src/ado/discovery.rs:classify_definitionDirect vs Consumer mislabelling edge case.
    is_direct_match uses the heuristic: exactly 1 marker + yamlFilename == stem + ".lock.yml". A hand-crafted pipeline that (a) includes exactly one ado-aw template and (b) happens to share the same root filename as that template's lock file would be labelled Direct instead of Consumer. In practice this is purely cosmetic — both statuses are actionable and reach discovered_to_matched — but the classification would be misleading in the CLI summary. Low probability scenario, worth a comment in is_direct_match acknowledging the cosmetic impact.

  • src/ado/discovery.rs — large-project task explosion.
    handles.push(tokio::spawn(...)) is called for every filtered definition before any handle is awaited. The semaphore correctly limits concurrent HTTP calls to 8, but in a project with thousands of definitions (e.g., msazuresphere/4x4), all N tokio tasks are spawned upfront. Tasks are cheap but not free; worth a comment noting the known trade-off. If N could be very large, an alternative is a stream-based approach (FuturesUnordered with bounded concurrency). Not urgent at current project scales.


✅ What Looks Good

  • Security coverage is thorough. ##vso[ injection is neutralised at both the echo line in the marker step (neutralize_pipeline_commands + bash_single_quote_escape) and at the summary output path in discovered_to_matched. Both have specific regression tests.

  • serde_json::json! for the metadata blob — correctly hands JSON-level escaping to serde rather than manually embedding the source path, preventing double-escaping. The json_marker_quote_in_source_round_trips_correctly test explicitly catches the previous regression.

  • normalize_repo_url percent-decodes before case-folding — robustly handles both ADO-encoded (My%20Project) and decoded forms in URL comparison.

  • parse_marker_step is forward-compatible — missing/unknown JSON fields are silently ignored (no #[serde(deny_unknown_fields)]), and the test_parse_marker_step_tolerates_extra_json_fields test confirms it.

  • marker_origin_matches lenient-for-empty — correctly falls back to source-only matching for legacy markers (empty org/repo) so pre-marker pipelines remain discoverable during the transition window.

  • Error propagationanyhow::Result throughout, .context() on every HTTP call, descriptive bail! messages at CLI boundary. No user-facing unwrap() outside provably-safe cases.

  • 12 unit tests for pure discovery logic, 6 for parse_marker_step, 4 integration tests for the marker step across all four targets — good coverage depth.

Generated by Rust PR Reviewer for issue #624 · ● 1.1M ·

The marker extension was originally wired via `setup_steps` but the
`refactor(marker)` commit moved emission to `prepare_steps` so a
metadata-only pipeline doesn't spawn a Setup pool agent. Several doc
comments still describe the marker as a Setup-job step, which is
misleading for anyone tracing the code from a comment.

* `src/compile/extensions/ado_aw_marker.rs` — module-level and
  struct-level comments now describe the Agent-job prepare-phase
  injection (and explain why `prepare_steps` was chosen over
  `setup_steps`).
* `src/detect.rs` — `MARKER_STEP_PREFIX` doc updated to say
  "injected Agent-job prepare step" instead of "Setup-job step".
* `src/ado/discovery.rs` — the `markers.len() > 1` rationale in
  `is_direct_match` now says "its own Agent-job prepare step".

PR description on #624 also corrected via `gh pr edit` to match.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jamesadevine jamesadevine merged commit 6d4c824 into main May 19, 2026
5 checks passed
@jamesadevine jamesadevine deleted the devinejames/secret-setting branch May 19, 2026 14:18
@github-actions
Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Solid implementation — well-structured, good security hygiene, and comprehensive test coverage. No blocking issues found.

Findings

✅ What Looks Good

  • is_direct_match strict equality: The earlier ends_with("/{stem}") tail-match is correctly replaced with strict equality after normalize_ado_yaml_path. The regression test (consumer_when_same_stem_in_different_directory) explicitly covers the false-positive case that motivated the change.

  • ##vso[ injection in echo step: The injected marker step correctly applies neutralize_pipeline_commands to the echo's user-controlled fields (source, org, repo) and bash_single_quote_escape for bash quoting safety. The JSON comment line is correctly left unescaped since ADO's logging-command scanner only fires on runtime stdout, not on comment lines rendered from the step body.

  • normalize_source_path double-escape fix: Removing replace('"', "\\\"") from the shared helper and letting serde_json::json!() handle JSON-level escaping is the right fix. generate_header_comment correctly applies its own inline replace('"', "\\\"") for the YAML-comment surface. The round-trip test (json_marker_quote_in_source_round_trips_correctly) closes the regression.

  • Concurrency guard: The ADO_AW_PREVIEW_CONCURRENCY=0 deadlock guard with an explicit warn! (rather than silent .max(1)) is the right call — it makes the typo debuggable.

  • clap conflicts_with: --all-repos and --source both carry conflicts_with = "definition_ids", so the claimed mutual exclusivity is enforced at parse time rather than silently resolved in code.

  • TOCTOU race (definition deleted mid-discovery): Correctly tracked as NotFound (distinct from PreviewFailed) so it's excluded from the "skip warning" count without requiring operator action.

  • Legacy path preserved: resolve_for_command only activates discovery when definition_ids.is_none() && (all_repos || source_filter.is_some()), leaving the legacy lexical matcher as the default for all existing call sites.

⚠️ Suggestions

  • src/ado/discovery.rs:144sem.acquire_owned().await.expect("semaphore not closed") inside a tokio::spawn closure: if this expect ever fires (unreachable given the Arc, but defensively), the spawned task panics, handle.await returns Err(JoinError), and the definition is silently dropped with only a warn!("Discovery worker panicked"). No caller would know a definition was skipped. Consider replacing with sem.acquire_owned().await.unwrap_or_else(|_| panic!(...)) or — more robustly — returning an Err from the closure so the outer loop can propagate or surface it more explicitly. Low priority given the semaphore is Arc-wrapped and cannot be closed before the tasks finish, but worth hardening before any future refactor that might move the drop point.

  • src/ado/discovery.rs (resolve_definitions_via_discovery doc) — the doc says --source matching "is case-sensitive even on Windows". This is correct and intentional, but it means a user who passes agents\Foo.md on Windows (wrong case) silently matches nothing, even after normalize_source_path has fixed the separator. The CLI help already calls this out, so this is more of a UX note — consider whether a case-fold fallback with a warning would improve the experience, especially for Windows users piping from Explorer.

  • src/ado/discovery.rs:revision fieldrevision: Option<u64> is deserialized on DefinitionSummary but the doc comment explicitly says "No caching is implemented yet." Consider marking it #[allow(dead_code)] or a TODO comment with an issue reference so the field doesn't quietly accumulate API data that future contributors assume is being used.

Generated by Rust PR Reviewer for issue #624 · ● 1.7M ·

github-actions Bot added a commit that referenced this pull request May 19, 2026
Add the two opt-in project-scope discovery flags introduced in
feat(secrets): #624 to the CLI reference page:

- --all-repos: activates Preview-driven discovery across the entire ADO
  project, not just local lock files
- --source <path>: filters to definitions whose ado-aw-metadata marker
  references the given template path

Also adds a "Project-scope discovery" subsection with usage examples and
updates the common-examples block with an --all-repos one-liner.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant